BitNet: Scaling 1-bit Transformers for Large Language Models